Modified Upgrader11to12.upgradeZookeeper to be idempotent #5349

Open
wants to merge 1 commit into
base: 3.1

dlmarion
Contributor

The namespace modifications that happen in this method can only be run once. On the second invocation this method would throw an IllegalStateException.

Closes #5347

@dlmarion dlmarion added this to the 3.1.0 milestone Feb 21, 2025
@dlmarion dlmarion self-assigned this Feb 21, 2025
Contributor

@cshannon cshannon left a comment


The changes here help make this more idempotent but I don't think it's truly idempotent because of how the cleanup is handled when deleting the old namespace paths.

With the previous code, the upgrade fails if there are any existing namespaces. With this change, the code is skipped when the length of namespacesData is not 0, which is an improvement.

However, it's possible (unlikely, of course, but it could happen in theory) that the upgrade stops or fails after inserting the new data but before the deletes are processed. So I think we might need to check whether the old data still needs to be deleted, even if the new node has been inserted and its size is not 0.
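To make the concern concrete, here is a minimal sketch of running the cleanup outside the length check, so that a run interrupted between the insert and the deletes can still finish on a retry. It is not the code in this PR: a plain `Map` stands in for ZooKeeper, and `NamespaceUpgradeSketch`, `ROOT`, `NAME_SUFFIX`, and the `id=name` line format are hypothetical names chosen for illustration. Note that it still trusts any non-empty data on the parent node, which is an assumption of its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a Map stands in for ZooKeeper (path -> node data).
// ROOT, NAME_SUFFIX and the "id=name" line format are assumptions for illustration.
class NamespaceUpgradeSketch {
  static final String ROOT = "/namespaces";
  static final String NAME_SUFFIX = "/name";

  // Direct children of ROOT, i.e. the namespace ids.
  static List<String> childIds(Map<String, byte[]> store) {
    String prefix = ROOT + "/";
    List<String> ids = new ArrayList<>();
    for (String path : store.keySet()) {
      if (path.startsWith(prefix) && path.indexOf('/', prefix.length()) < 0) {
        ids.add(path.substring(prefix.length()));
      }
    }
    return ids;
  }

  static void upgrade(Map<String, byte[]> store) {
    List<String> ids = childIds(store);

    // Step 1: write the combined mapping only if the parent node is still empty.
    if (store.get(ROOT).length == 0) {
      StringBuilder mapping = new StringBuilder();
      for (String id : ids) {
        byte[] name = store.get(ROOT + "/" + id + NAME_SUFFIX);
        mapping.append(id).append('=')
            .append(new String(name, StandardCharsets.UTF_8)).append('\n');
      }
      store.put(ROOT, mapping.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Step 2: cleanup runs on every invocation, so leftovers from an
    // interrupted earlier run are removed. Removing a missing key is a no-op.
    for (String id : ids) {
      store.remove(ROOT + "/" + id + NAME_SUFFIX);
    }
  }

  public static void main(String[] args) {
    Map<String, byte[]> store = new TreeMap<>();
    store.put(ROOT, new byte[0]);
    store.put(ROOT + "/1", new byte[0]);
    store.put(ROOT + "/1" + NAME_SUFFIX, "ns1".getBytes(StandardCharsets.UTF_8));
    upgrade(store);
    upgrade(store); // second invocation changes nothing
    System.out.println(store.keySet());
  }
}
```

In this sketch, starting from a state where the mapping is already written but the old name nodes still exist, a single call to `upgrade` removes the leftovers and leaves the mapping untouched.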

@cshannon
Contributor

This is the loop I am referring to: it may not be executed even though the insertion happened. Maybe ZooKeeper went down at that exact moment, for example.

Comment on lines +137 to +150
if (namespacesData.length == 0) {
  List<String> namespaceIdList = zrw.getChildren(zPath);
  Map<String,String> namespaceMap = new HashMap<>();
  for (String namespaceId : namespaceIdList) {
    String namespaceNamePath = zPath + "/" + namespaceId + ZNAMESPACE_NAME;
    namespaceMap.put(namespaceId, new String(zrw.getData(namespaceNamePath), UTF_8));
  }
  byte[] mapping = NamespaceMapping.serialize(namespaceMap);
  zrw.putPersistentData(zPath, mapping, ZooUtil.NodeExistsPolicy.OVERWRITE);

  for (String namespaceId : namespaceIdList) {
    String namespaceNamePath = zPath + "/" + namespaceId + ZNAMESPACE_NAME;
    zrw.delete(namespaceNamePath);
  }
Member


This is nearly identical to the code suggested by @keith-turner in #4996. I replied with #4996 (comment)

In short, it is not safe to assume that a non-zero length of data here is a complete and valid mapping.

If this fails, it could have failed for any number of unpredictable reasons. The current code behavior will catch it and force the user to deal with the unique and special circumstances that apply to them. This change, however, makes a lot of assumptions about the current contents being the result of a very specific situation that you encountered due to another bug in the upgrade process. That situation has already been fixed. If this is encountered again, it is likely a completely different scenario, and you can't make assumptions about that new scenario and why it failed.

It's nice to try to make the upgrade code more idempotent, but it's very hard to do that correctly and safely. We really need to make sure the upgrade works without failing. But, if it does fail, it's more important to detect it, and troubleshoot why. We really shouldn't be making guesses here.
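As an illustration of detecting rather than guessing, the sketch below treats existing data as "already done" only when it parses and lists exactly the namespace ids that are present, and throws in every other case. This is a hypothetical example, not Accumulo code: `MappingCheckSketch`, `classify`, and the `id=name` line format are made up here, and the real `NamespaceMapping` serialization may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only: distinguishes "not started" from "already done" and
// fails loudly on anything else instead of assuming how it got there.
class MappingCheckSketch {
  enum State { NOT_STARTED, ALREADY_DONE }

  // Hypothetical format: one "id=name" pair per line.
  static Map<String, String> parse(byte[] data) {
    Map<String, String> map = new TreeMap<>();
    String text = new String(data, StandardCharsets.UTF_8);
    for (String line : text.split("\n")) {
      if (line.isEmpty()) {
        continue;
      }
      int eq = line.indexOf('=');
      if (eq < 0) {
        throw new IllegalStateException("Unparseable mapping entry: " + line);
      }
      map.put(line.substring(0, eq), line.substring(eq + 1));
    }
    return map;
  }

  static State classify(byte[] existing, Set<String> childIds) {
    if (existing.length == 0) {
      return State.NOT_STARTED;
    }
    Map<String, String> found = parse(existing);
    if (!found.keySet().equals(childIds)) {
      // Unknown situation: surface it to the operator rather than continue.
      throw new IllegalStateException(
          "Mapping ids " + found.keySet() + " do not match namespace ids " + childIds);
    }
    return State.ALREADY_DONE;
  }
}
```

Even a check like this only proves that the ids line up. It says nothing about why the data is there, so whether the `ALREADY_DONE` case is safe to skip remains a judgment call.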

3 participants